{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 生成中间文件\n",
    "\n",
    "本代码的目的是生成三个中间文件，其中hot_map是统计的商品的出现次数，upwardmap与downward_map是将商品id映射到实数集\\[0, m\\]，其中m代表商品总数。\n",
    "\n",
    "This code aims to generate three temporary files. Hot_map statistics the number of appearance of each item. Upward_map and Downward_map can map the ItemID into \\[0, m\\], where m indicates the size of item corpus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# round2 train的路径\n",
    "path = '../ECommAI_EUIR_round2_train_20190816/'\n",
    "data = pd.read_csv(path + 'user_behavior.csv',header=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.columns = ['userID','itemID','behavior','timestamp']\n",
    "data['day'] = data['timestamp'] // 86400\n",
    "data['hour'] = data['timestamp'] // 3600 % 24"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_times = data[['itemID','userID']].groupby('userID', as_index=False).count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_times.columns = ['userID','itemCount']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_times_map = dict(zip(user_times['userID'], user_times['itemCount']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1396951"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(user_times_map)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "f = open('usersActivity_map.txt', 'w')\n",
    "f.write(str(user_times_map))\n",
    "f.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "生成upward_map 与 downward_map:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import preprocessing\n",
    "le = preprocessing.LabelEncoder()\n",
    "item['encoding'] = le.fit_transform(item['itemID'])\n",
    "\n",
    "upward_map = dict(zip(item['itemID'], item['encoding']))\n",
    "downward_map = dict(zip(item['encoding'], item['itemID']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "生成hot table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "temp = data[['itemID','behavior']].groupby('itemID',as_index=False).count()\n",
    "hot_map = dict(zip(temp['itemID'], temp['behavior']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def save_to_file(trans_map, file_path):\n",
    "    trans_map = str(trans_map)\n",
    "    f = open(file_path, 'w')\n",
    "    f.write(trans_map)\n",
    "    f.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "save_to_file(hot_map,'hot_items_map.txt')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "save_to_file(upward_map,'upward_map.txt')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "save_to_file(downward_map,'downward_map.txt')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
